Published August 30, 2023
By Advay Shindikar and Kimberly Mann Bruch, SDSC Communications
The Climate Informatics (CI) Conference 2023 (CI2023) was recently held at the University of Cambridge and aligned with the National Science Foundation-funded research coordination network, FAIR in Machine Learning (ML), AI Readiness, and AI Reproducibility (FARR), which is co-led by San Diego Supercomputer Center Research Data Services Director Christine Kirkpatrick.
“One of the exciting aspects of CI2023 was a Reproducibility Challenge, which was co-hosted by the Climate Informatics and Cambridge University Press & Assessment with support from Cambridge University, the Alan Turing Institute and Simula Research Laboratory,” said Kirkpatrick. “The challenge offered attendees an opportunity to collaborate with teams of two to four participants to create a notebook that reproduced the key contributions of a published environmental data science paper for eventual integration in the open-source Environmental Data Science (EDS) Book. FARR is lucky to have Douglas Rao in our leadership, who is often driving novel community engagement methods in impactful ways.”
The challenge was co-led by Yuhan Douglas Rao, FARR co-principal investigator and research scientist at North Carolina State University, and drew approximately 30 participants from around the world. The participants formed seven teams, each tasked with creating a notebook-based repository that reimplemented the workflow of a paper published in Environmental Data Science, a journal of Cambridge University Press. The teams verified and validated the findings of the journal studies and then identified potential improvements in best practices for sharing data and code. Three teams submitted complete notebook repositories, which are now being published in the Environmental Data Science Book, an open-source initiative that seeks to promote the principles of FAIR (findable, accessible, interoperable, reusable) code and data across the earth and climate sciences.
The winning team consisted of researchers from the University of Colorado Boulder, UC Berkeley and Claremont McKenna College, who reproduced the paper “A sensitivity analysis of a regression model of ocean temperature.” After the challenge, participants shared their experiences in reproducing other researchers' work.
“Proper documentation of the code, data and computing environment is critical for reproducible data science — especially for research using large scale datasets — and, some unexpected factors may influence the reproducibility, such as the language used in documentation,” said Rao. “Participants of the challenge encountered roadblocks when the documentation of the code was written in the native language of the researchers which required translation and communication with the original authors.”
In collaboration with partners, FARR has been raising awareness of AI reproducibility in climate informatics and promoting methods for achieving it.
“Through partnerships with initiatives like CI2023, FARR gathers key input that the community can use to identify best practices in documenting their (AI) analysis methods for increasing AI reproducibility and confidence in the conclusions drawn from the results,” Kirkpatrick said. “By providing guidance and support, we are working to ensure that the results of climate informatics research are widely accessible, enabling the broader scientific community to replicate and build upon these findings.”